A Visualization Tool for Mining Large Correlation Tables: The Association Navigator
Abstract
The Association Navigator is an interactive visualization tool for viewing large tables of correlations. Its basic operation is zooming and panning of a table presented in graphical form, here called a “blockplot”. The tool is really a toolbox that includes, among other things: (1) display of p-values and missing-value patterns in addition to correlations, (2) mark-up facilities to highlight variables and sub-tables as landmarks when navigating the larger table, (3) histograms/barcharts, scatterplots and scatterplot matrices as “lenses” into the distributions of variables and variable pairs, (4) thresholding of correlations and p-values to show only strong correlations and highly significant p-values, (5) trimming of extreme values of the variables for robustness, (6) “reference variables” that stay in sight at all times, and (7) wholesale adjustment of groups of variables for other variables. The tool has been applied to data with nearly 2,000 variables, with associated tables approaching 2,000 × 2,000 in size. Its usefulness lies less in beholding gigantic tables in their entirety than in searching for interesting association patterns by navigating manageable but numerous and interconnected sub-tables.
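Feature (4), thresholding correlations and p-values, can be illustrated with a small sketch. It is not the Association Navigator's implementation: the abstract does not say how p-values are computed, so this sketch assumes a two-sided test of zero correlation using the Fisher z-transform with a normal approximation (an exact test would use the t distribution with n − 2 degrees of freedom). Missing values are assumed to be coded as `None` and are dropped pairwise. The function names and the default cutoffs `min_abs_r` and `max_p` are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson r over pairs where neither value is missing (None); returns (r, n)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        return float("nan"), n
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan"), n  # a constant variable has no defined correlation
    return sxy / math.sqrt(sxx * syy), n

def approx_p_value(r, n):
    """Two-sided p-value for H0: rho = 0 via the Fisher z-transform (assumed method)."""
    if math.isnan(r) or n <= 3:
        return float("nan")
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite when |r| == 1
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

def thresholded_table(data, min_abs_r=0.5, max_p=0.01):
    """Keep only variable pairs with |r| >= min_abs_r and p <= max_p (hypothetical cutoffs)."""
    kept = {}
    for a, b in combinations(sorted(data), 2):
        r, n = pearson(data[a], data[b])
        p = approx_p_value(r, n)
        if not math.isnan(p) and abs(r) >= min_abs_r and p <= max_p:
            kept[(a, b)] = (r, p)
    return kept

data = {
    "x": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "y": [2, 4, 6, 8, 10, 12, 14, 16, 18, 21],
    "z": [5, 3, 5, 3, 5, 3, 5, 3, 5, 3],
}
result = thresholded_table(data)
print(sorted(result))  # [('x', 'y')]
```

Only the strongly related pair survives. The weak pairs involving `z` (|r| below 0.2) are blanked out, which is the effect thresholding has on a blockplot.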
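Feature (7), wholesale adjustment of groups of variables for other variables, is not specified further in the abstract. One common reading is sketched below as an assumption: each variable in the group is replaced by its residuals from an ordinary least-squares regression on an intercept plus the adjusting variables. Correlations among those residuals are then partial correlations. The helper names are hypothetical. The sketch assumes complete data and a non-singular design, and uses plain Gaussian elimination on the normal equations for self-containment rather than numerical robustness.

```python
import math

def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def residualize(y, adjusters):
    """Residuals of y after least-squares regression on an intercept plus the adjusters."""
    rows = [[1.0] + [adj[i] for adj in adjusters] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = solve(xtx, xty)
    return [yi - sum(bj * v for bj, v in zip(beta, r)) for r, yi in zip(rows, y)]

def corr(x, y):
    """Pearson correlation for complete data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def adjust_group(group, adjusters):
    """Replace every variable in `group` by its residuals on `adjusters`."""
    return {name: residualize(vals, adjusters) for name, vals in group.items()}

# Two variables driven by a common factor w plus mutually orthogonal noise.
w = [1, 2, 3, 4, 5, 6, 7, 8]
e1 = [1, -1, -1, 1, 1, -1, -1, 1]
e2 = [1, -1, 1, -1, -1, 1, -1, 1]
a = [wi + ei for wi, ei in zip(w, e1)]
b = [wi + ei for wi, ei in zip(w, e2)]

raw = corr(a, b)  # approximately 0.84
adjusted = adjust_group({"a": a, "b": b}, [w])
partial = corr(adjusted["a"], adjusted["b"])  # approximately 0.0
print(round(raw, 6), round(partial, 6))
```

In this example the raw correlation between `a` and `b` comes entirely from the shared factor `w`. After adjusting both variables for `w`, the association disappears, which is the kind of change such an adjustment is meant to reveal in a correlation table.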